
ANR: 957881
The forex market plays a key role in the daily management of multinational companies operating in different currencies. Its volume surpasses that of any other market, exceeding $5 trillion per day. Its global centre is London, where around 40 per cent of all transactions are executed.
International trade and global investing nourish and heavily rely on it. Its functioning is critical to support imports and exports, and consequently to encourage the exchange of resources and the creation of additional demand for goods and services. Without its current liquidity, companies' potential would be limited and global economic growth would suffer.
That said, investors also benefit from the foreign exchange market. The need for diversification often requires currency exchange in order to buy and sell foreign assets and/or securities. Equally, some investors treat currencies as an asset class in itself and trade them to generate returns. The latter is precisely what this assignment attempts. Nevertheless, a good prediction of FX rates also greatly helps multinational companies exposed to exchange-rate risk, allowing them to hedge effectively against currency volatility.
Interactions between dealers occur in a global over-the-counter (OTC) network that connects buyers and sellers. There is no single exchange; instead, there is strong interconnectivity between marketplaces. A result of such a structure is the difference in rates between banks or market makers, which leaves room for arbitrage opportunities, especially in low-volume currencies. In this assignment we assume the rates obtained from the data sources are available prices for those exchanges, and that the real applicability of the results would depend on the platform we trade on and, consequently, on its prices.
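As a toy illustration of such an arbitrage gap (all quotes below are assumed numbers, not market data):

#Toy check of a cross-rate arbitrage gap between two market makers (assumed quotes)
usd_eur_a = 0.92   #assumed USD->EUR rate at dealer A
eur_gbp_a = 0.86   #assumed EUR->GBP rate at dealer A
usd_gbp_b = 0.80   #assumed direct USD->GBP rate at dealer B
implied_usd_gbp = usd_eur_a * eur_gbp_a   #USD->GBP going through EUR
#If the implied cross rate differs from the direct quote, a round trip can lock in a profit
print('Implied cross rate: %.4f vs direct quote: %.4f' % (implied_usd_gbp, usd_gbp_b))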
In terms of currency distribution in the FX markets:
| Rank | Currency | Share |
| --- | --- | --- |
| 1 | US dollar (USD) | 87.6% |
| 2 | Euro (EUR) | 31.4% |
| 3 | Japanese yen (JPY) | 21.6% |
| 4 | Pound sterling (GBP) | 12.8% |
| 5 | Australian dollar (AUD) | 6.9% |

Source: Triennial Central Bank Survey. Basel, Switzerland: Bank for International Settlements, 2016.
Note: as each transaction involves a pair of currencies, shares sum to 200%.
Its fluctuations are caused by factors across countries such as changes in GDP growth, inflation, interest rates, budget and trade deficits or surpluses, substantial M&A activity, and other macroeconomic conditions.
In the analysis to come, we will briefly introduce the classical macroeconomic approaches of Relative Purchasing Power Parity (PPP) and the International Fisher effect for interest rates.
PPP theory says that exchange rates equal the ratio of purchasing power between countries. This means that a fall in purchasing power of one currency would lead to a decrease in that currency's price in the FX market. Therefore, we could expect that, at parity, a product's value would be the same in any currency terms, as exchange rates would adjust to offset differences.
This might be true for easily and cheaply transportable commodities; nonetheless, it does not hold for other goods and services because transportation costs distort the parity.
$$S=\frac{P_{1}}{P_{2}}$$
Where:
- $S$ is the exchange rate (units of currency 1 per unit of currency 2).
- $P_{1}$ is the price of a given basket of goods in currency 1.
- $P_{2}$ is the price of the same basket in currency 2.
From this approach, we can test the theory and detect overpriced or underpriced currencies by pricing comparable baskets of goods.
A classical example of a PPP index is the one from The Economist:

The Big Mac index helps determine over- and under-priced currencies with respect to the US Dollar. It follows PPP theory and argues that if a Big Mac costs $P_{USD}$ in the US and $P_{GBP}$ in the UK, then the PPP exchange rate will be $S^{PPP}_{USD/GBP}=\frac{P_{USD}}{P_{GBP}}$. Whenever $S^{Market}_{USD/GBP}>S^{PPP}_{USD/GBP}$, we would find the British pound overvalued.
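To make this concrete, here is a minimal sketch with hypothetical Big Mac prices (illustrative numbers, not The Economist's actual data):

#Hypothetical Big Mac prices, purely illustrative
p_usd = 5.30          #price of a Big Mac in the US, in USD
p_gbp = 3.19          #price of a Big Mac in the UK, in GBP
market_rate = 1.41    #assumed market rate, USD per GBP
#Implied PPP exchange rate, USD per GBP
ppp_rate = p_usd / p_gbp
#If the market rate exceeds the PPP rate, the pound is overvalued; below it, undervalued
gap = market_rate / ppp_rate - 1
print('PPP rate: %.3f USD/GBP' % ppp_rate)
print('GBP valuation gap vs PPP: %.1f%%' % (100 * gap))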
So far, the discussion has focused on Absolute PPP. However, in order to predict movements in the exchange rate we have to add variability in the prices of goods and services across countries. That is why we introduce Relative PPP, which predicts the relationship between inflation rates and the exchange rate.
The reasoning is that the movement in market rates equals the ratio of inflation rates. Hence:
$$\frac{S^{USD/GBP}_{t}}{S^{USD/GBP}_{t-1}}=\frac{{P^{USD}_{t}}/{P^{USD}_{t-1}}}{{P^{GBP}_{t}}/{P^{GBP}_{t-1}}}$$
Therefore, if inflation in the UK is larger than inflation in the US, the British Pound would be expected to depreciate to maintain international purchasing parity. We can observe the consequences of such a movement as individuals: we can buy product X in the US ($X^{US}$) or in the UK ($X^{UK}$). Let's assume the exchange rate is such that the price in terms of Pounds is the same.
At $t$:
$$X^{US}/S^{USD/GBP}=X^{US \rightarrow GBP} \qquad X^{US \rightarrow GBP}=X^{GBP}$$
At $t+1$ there is inflation in the US but prices are fixed in the UK:
$$\uparrow X^{US}\qquad S^{USD/GBP}\rightarrow \, ?$$
If $\uparrow S^{USD/GBP}$ and its rise is proportional to the inflation impact, then the previous balance $X^{US \rightarrow GBP}=X^{GBP}$ is preserved.
On the contrary, if $\downarrow S^{USD/GBP}$ and the USD gets stronger:
$$\uparrow X^{US}/\downarrow S^{USD/GBP}=\uparrow X^{US \rightarrow GBP} \qquad X^{US \rightarrow GBP}>X^{GBP}$$
In this situation British citizens' purchasing power is damaged relative to the US. Such a change in prices and rates would hurt US exports and favour UK exports in the two countries' trade. Thus, testing the significance of inflation differentials between pairs is one of the techniques that can help predict next period's FX rate.
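As a minimal numerical sketch of the Relative PPP prediction (the inflation figures below are assumptions, not observed data):

#Relative PPP: predict next period's rate from assumed inflation figures
s_usd_gbp = 1.40   #current spot rate, USD per GBP (assumed)
infl_us = 0.03     #assumed US inflation
infl_uk = 0.05     #assumed UK inflation
#S_t+1 / S_t = (1 + infl_US) / (1 + infl_UK)
s_next = s_usd_gbp * (1 + infl_us) / (1 + infl_uk)
#UK inflation is higher, so the Pound is expected to depreciate (USD/GBP falls)
print('Expected rate next period: %.4f USD/GBP' % s_next)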
Fisher's hypothesis suggests that differences in nominal interest rates explain changes in spot rates in the FX market. We first need to assume that interest rate parity holds. This no-arbitrage condition states that investors will be indifferent between depositing in the home country or in any foreign one, as the difference in lending and deposit rates will be offset by changes in the foreign exchange rate.
Combining Fisher's theory, in which the difference between nominal and real rates equals expected inflation, with the uncovered interest rate parity argument that no profit can be made by carry trading currencies, we obtain the following:
$$\frac{E(S^{USD/GBP}_{t+1})}{S^{USD/GBP}_{t}} - 1= \frac{i_{US}-i_{GB}}{1+i_{GB}}=E(\xi )$$
Where:
- $E(S^{USD/GBP}_{t+1})$ is the expected future spot rate for the pair USD/GBP.
- $S^{USD/GBP}_{t}$ is today's rate.
- $i_{US}$ is the nominal rate in the US.
- $i_{GB}$ is the nominal rate in Great Britain.
- $E(\xi )$ is the expected rate of change in the FX rate.
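A minimal sketch of this relation (the nominal rates below are assumptions for illustration):

#International Fisher effect: expected change in the spot rate from nominal rate differentials
i_us = 0.025   #assumed nominal rate in the US
i_gb = 0.010   #assumed nominal rate in Great Britain
s_t = 1.40     #assumed current USD/GBP spot rate
expected_change = (i_us - i_gb) / (1 + i_gb)   #E(xi)
expected_spot = s_t * (1 + expected_change)    #E(S_t+1)
print('E(xi): %.4f, E(S_t+1): %.4f USD/GBP' % (expected_change, expected_spot))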
As we have seen, both interest rates and inflation play an important role in international finance theories. Therefore, we may expect them to influence our goal of forecasting future changes in the rate.
import pandas_datareader as pdr
import numpy as np
import datetime
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
start=datetime.datetime(1980,1,1)
end=datetime.datetime(2016,12,31)
#Defining data to download and readable names to change afterwards
tickers = ['DEXJPUS','DEXUSEU','DEXUSUK']
change_names_fx=['JPYUSD','USDEUR','USDGBP']
tickers_funda=['BPBLTT01EZQ637N','BPBLTT01JPQ637N','BPBLTT01GBQ637N','BPBLTT01USQ636N','INTDSRJPM193N',
'IRLTLT01JPM156N','INTDSREZQ193N','IRLTLT01EZM156N','FEDFUNDS','DGS10','INTDSRGBM193N',
'IRLTLT01GBM156N','FPCPITOTLZGJPN','FPCPITOTLZGEMU','FPCPITOTLZGGBR','FPCPITOTLZGUSA']
change_names_funda=['CB EA','CB JP','CB UK','CB US','Rate JP','10Y JP','Rate EUR','10Y EUR',
'Rate US','10Y US','Rate UK','10Y UK','Infl JP','Infl EUR','Infl UK','Infl US']
#We define the function to take data from FRED
def get_data_from_fred(start,end,tickers,tickers_funda):
    main_df=pd.DataFrame()
    #We loop over the FX tickers
    for n in range(0,len(tickers)):
        ticker_compile = pdr.DataReader(tickers[n], 'fred',start, end)
        main_df=main_df.join(ticker_compile,how='outer')
    #Then over the fundamental tickers
    if len(tickers_funda)>0:
        for n in range(0,len(tickers_funda)):
            ticker_compile = pdr.DataReader(tickers_funda[n], 'fred',start, end)
            main_df=main_df.join(ticker_compile,how='outer')
    main_df.to_csv('Data_joined.csv')
get_data_from_fred(start,end,tickers,tickers_funda)
import math
#Load the file with the data into a DataFrame
main_df=pd.read_csv('Data_joined.csv',index_col=0)
#Replace null values with average of previous and next value
for n in range(len(tickers)):
    for z in range(len(main_df.iloc[:,n])):
        if z==0:
            if(pd.isnull(main_df.iloc[z,n])):
                main_df.iloc[z,n]=main_df.iloc[z+1,n]
        if z>0:
            if(pd.isnull(main_df.iloc[z,n])):
                main_df.iloc[z,n]=(main_df.iloc[z-1,n]+main_df.iloc[z+1,n])/2
#Replace missing values of data that is issued less frequently (forward fill)
for n in range(len(tickers),len(tickers)+len(tickers_funda)):
    for z in range(len(main_df.iloc[:,n])):
        if(pd.isnull(main_df.iloc[z,n])):
            main_df.iloc[z,n]=main_df.iloc[z-1,n]
#Replace names into a readable format and calculate rates from USD perspective
main_df.columns=change_names_fx+change_names_funda
main_df['USDEUR']=1/main_df['USDEUR']
main_df['USDGBP']=1/main_df['USDGBP']
main_df.rename(columns={'USDEUR':'EURUSD','USDGBP':'GBPUSD'},inplace=True)
#Calculate USD strength against JPY and GBP
main_df['Trend JP']=0
main_df['Trend GBP']=0
for x in range(0,len(main_df['EURUSD'])):
    main_df['Trend JP'].iloc[x]=(main_df['JPYUSD'].iloc[x]/main_df['JPYUSD'].iloc[0])
    main_df['Trend GBP'].iloc[x]=(main_df['GBPUSD'].iloc[x]/main_df['GBPUSD'].iloc[0])
#Extra formatting for EUR (data from 2000 on), same as before, strength of USD against EUR
main_df_EUR=main_df['2000-01-01'::].copy()
main_df_EUR['Trend EUR']=0
for x in range(0,len(main_df_EUR['EURUSD'])):
    main_df_EUR['Trend EUR'].iloc[x]=(main_df_EUR['EURUSD'].iloc[x]/main_df_EUR['EURUSD'].iloc[0])
#Formatting for Q and Y frequencies
main_df.index = pd.to_datetime(main_df.index)
main_df_Q=main_df.resample('Q').mean()
main_df_Q=main_df_Q.reset_index()
main_df_Q['DATE'] = main_df_Q['DATE'].dt.to_period("Q")
main_df_Q=main_df_Q.set_index('DATE')
main_df_Y=main_df.resample('A').mean()
main_df_Y=main_df_Y.reset_index()
main_df_Y['DATE'] = main_df_Y['DATE'].dt.to_period("A")
main_df_Y=main_df_Y.set_index('DATE')
#EUR
main_df_EUR.index = pd.to_datetime(main_df_EUR.index)
main_df_Q_EUR=main_df_EUR.resample('Q').mean()
main_df_Q_EUR=main_df_Q_EUR.reset_index()
main_df_Q_EUR['DATE'] = main_df_Q_EUR['DATE'].dt.to_period("Q")
main_df_Q_EUR=main_df_Q_EUR.set_index('DATE')
main_df_Y_EUR=main_df_EUR.resample('A').mean()
main_df_Y_EUR=main_df_Y_EUR.reset_index()
main_df_Y_EUR['DATE'] = main_df_Y_EUR['DATE'].dt.to_period("A")
main_df_Y_EUR=main_df_Y_EUR.set_index('DATE')
Note: trends are indexed to 1 at the start of each series.
#Plot strength
import matplotlib.pyplot as plt
main_df['Trend EUR']=main_df_EUR['Trend EUR']
import plotly.plotly as py
import plotly
import plotly.graph_objs as go
from datetime import datetime
import pandas_datareader.data as web
line1 = go.Scatter(
    x = main_df_EUR.index,
    y = main_df_EUR['Trend EUR'],
    name = "EUR",
)
line2 = go.Scatter(
    x = main_df.index,
    y = main_df['Trend GBP'],
    name = "GBP",
)
line3 = go.Scatter(
    x = main_df.index,
    y = main_df['Trend JP'],
    name = "JP",
)
data = [line1,line2,line3]
layout = dict(
    title='USD strength',
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=1,
                     label='YTD',
                     step='year',
                     stepmode='todate'),
                dict(count=1,
                     label='1y',
                     step='year',
                     stepmode='backward'),
                dict(step='all')
            ])
        ),
        rangeslider=dict(),
        type='date'
    )
)
fig = dict(data=data, layout=layout)
py.iplot(fig)
#Calculation of differentials across currencies
###EUR###
main_df_Y_EUR['% Change EURUSD']=0
main_df_Y_EUR['Dif Rates USD-EUR']=0
main_df_Y_EUR['Dif Inf USD-EUR']=0
for x in range(0,len(main_df_Y_EUR['EURUSD'])):
    if x==(len(main_df_Y_EUR['EURUSD'])-1):
        main_df_Y_EUR['% Change EURUSD'].iloc[x]=0
    else:
        #Calculate the change that happened that year with that year's variables
        main_df_Y_EUR['% Change EURUSD'].iloc[x]=(main_df_Y_EUR['EURUSD'].iloc[x+1]/main_df_Y_EUR['EURUSD'].iloc[x])-1
    main_df_Y_EUR['Dif Rates USD-EUR'].iloc[x]=(main_df_Y_EUR['Rate US'].iloc[x]-main_df_Y_EUR['Rate EUR'].iloc[x])
    #Inflation values in that year are known from last year (we do not know current inflation, so t-1)
    main_df_Y_EUR['Dif Inf USD-EUR'].iloc[x]=(main_df_Y_EUR['Infl US'].iloc[x-1]-main_df_Y_EUR['Infl EUR'].iloc[x-1])
###JPY###
main_df_Y['% Change JPYUSD']=0
main_df_Y['Dif Rates USD-JP']=0
main_df_Y['Dif Inf USD-JP']=0
###Machine learning analysis###
main_df_Y['Dif 10Y Rates USD-JP']=0
main_df_Y['Dif CB USD-JP']=0
for x in range(0,len(main_df_Y['JPYUSD'])):
    if x==(len(main_df_Y['JPYUSD'])-1):
        main_df_Y['% Change JPYUSD'].iloc[x]=0
    else:
        #Calculate the change that happened that year with that year's variables
        main_df_Y['% Change JPYUSD'].iloc[x]=(main_df_Y['JPYUSD'].iloc[x+1]/main_df_Y['JPYUSD'].iloc[x])-1
    main_df_Y['Dif Rates USD-JP'].iloc[x]=(main_df_Y['Rate US'].iloc[x]-main_df_Y['Rate JP'].iloc[x])
    main_df_Y['Dif 10Y Rates USD-JP'].iloc[x]=(main_df_Y['10Y US'].iloc[x]-main_df_Y['10Y JP'].iloc[x])
    #Inflation values in that year are known from last year (we do not know current inflation, so t-1)
    main_df_Y['Dif Inf USD-JP'].iloc[x]=(main_df_Y['Infl US'].iloc[x-1]-main_df_Y['Infl JP'].iloc[x-1])
    main_df_Y['Dif CB USD-JP'].iloc[x]=(main_df_Y['CB US'].iloc[x-1]-main_df_Y['CB JP'].iloc[x-1])
###GBP###
main_df_Y['% Change GBPUSD']=0
main_df_Y['Dif Rates USD-UK']=0
main_df_Y['Dif Inf USD-UK']=0
for x in range(0,len(main_df_Y['GBPUSD'])):
    if x==(len(main_df_Y['GBPUSD'])-1):
        main_df_Y['% Change GBPUSD'].iloc[x]=0
    else:
        #Calculate the change that happened that year with that year's variables
        main_df_Y['% Change GBPUSD'].iloc[x]=(main_df_Y['GBPUSD'].iloc[x+1]/main_df_Y['GBPUSD'].iloc[x])-1
    main_df_Y['Dif Rates USD-UK'].iloc[x]=(main_df_Y['Rate US'].iloc[x]-main_df_Y['Rate UK'].iloc[x])
    #Inflation values in that year are known from last year (we do not know current inflation, so t-1)
    main_df_Y['Dif Inf USD-UK'].iloc[x]=(main_df_Y['Infl US'].iloc[x-1]-main_df_Y['Infl UK'].iloc[x-1])
#EURUSD regression Rates and Inflation
Reg_EURUSD_Y_Dependent=main_df_Y_EUR[0:-1]['% Change EURUSD']
Reg_EURUSD_Y_Variables=main_df_Y_EUR[0:-1][['Dif Rates USD-EUR','Dif Inf USD-EUR']]
import statsmodels.api as sm
Reg_EURUSD_Y_Variables=sm.add_constant(Reg_EURUSD_Y_Variables)
result_EURUSD=sm.OLS(Reg_EURUSD_Y_Dependent,Reg_EURUSD_Y_Variables).fit()
#Regression JPYUSD
Reg_JPYUSD_Y_Dependent=main_df_Y[0:-1]['% Change JPYUSD']
Reg_JPYUSD_Y_Variables=main_df_Y[0:-1][['Dif Rates USD-JP','Dif Inf USD-JP']]
Reg_JPYUSD_Y_Variables=sm.add_constant(Reg_JPYUSD_Y_Variables)
result_JPYUSD=sm.OLS(Reg_JPYUSD_Y_Dependent,Reg_JPYUSD_Y_Variables).fit()
#GBPUSD Regression, unfortunately we only have all data for both countries from 1990
Reg_GBPUSD_Y_Dependent=main_df_Y['1990'::]['% Change GBPUSD']
Reg_GBPUSD_Y_Variables=main_df_Y['1990'::][['Dif Rates USD-UK','Dif Inf USD-UK']]
Reg_GBPUSD_Y_Variables=sm.add_constant(Reg_GBPUSD_Y_Variables)
result_GBPUSD=sm.OLS(Reg_GBPUSD_Y_Dependent,Reg_GBPUSD_Y_Variables).fit()
print(result_EURUSD.summary())
print(result_JPYUSD.summary())
print(result_GBPUSD.summary())
Only the inflation differential of the JPY/USD pair appears as a significant factor in our regressions. It has a positive sign; consequently, when inflation in the US is higher than in Japan we expect the Dollar to strengthen against the Yen.
How can we explain this deviation from theoretical frameworks?
Market fluctuations are not always a reflection of fundamentals; often, those movements reflect expectations projected onto the underlying tradable good.
That is why, when a country experiences inflation, as the US does in this case, investors might expect the central bank to raise rates and thus increase their demand for US dollars. The long-term empirical evidence shows that positive inflation differentials depreciate the currency, in line with purchasing power parity theory.
Technical analysis is used to forecast the direction of prices from past data: prices, ratios, moving averages, volumes, and chartism, among others. It follows behavioural economics theories which argue that investors' pattern-seeking biases push prices away from efficient, unpredictable pricing, leaving room for pattern-recognition strategies. If a large enough number of investors believe in technical analysis, it will be embedded in trading strategies and should therefore work if patterns are predicted correctly.
There is an extensive range of techniques; the most famous are chart patterns such as the head and shoulders or the double top/bottom, moving-average crossings, and lines of support, resistance, or channels.
Technical analysis attempts to capture the psychology of crowds and the cycles of optimism and fear. It relies on the idea that history repeats itself in the markets and that it is possible to read those trends.
Figure: example of a moving-average analysis.
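As an illustration of the idea (not part of this assignment's strategy), a minimal moving-average crossover on an assumed random-walk price series could look like this:

#Minimal moving-average crossover sketch on an assumed price series (illustrative only)
import numpy as np
import pandas as pd
prices = pd.Series(np.cumsum(np.random.randn(500)) + 100)   #assumed random-walk prices
short_ma = prices.rolling(window=20).mean()
long_ma = prices.rolling(window=100).mean()
#Signal: long (1) when the short average is above the long one, flat (0) otherwise
signal = (short_ma > long_ma).astype(int)
print(signal.value_counts())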
Nevertheless, pattern recognition is nowadays increasingly performed by machines. The purpose of our analysis is to test how several machine learning algorithms behave when facing the task of detecting patterns in our data. The data will be exclusively daily average Ask and Bid prices for the JPY/USD and the EUR/USD pairs. Equally, the back-testing simulations will be performed from the point of view of a US investor.
The trading horizon is daily: each day we decide whether to hold USD, buy JPY (EUR), or sell JPY (EUR). For that, we provide the algorithm with training data before evaluating it on unseen backtests. In this training data, we implement a signal feature based on the historical return at t+3. By observing what kind of trend preceded the different outcomes over the next three days, the algorithm can recognize it in the test data, ensuring a longer-term trend.
Next, we explain the process in detail:
We start by downloading the data from Quandl and calculating and plotting the returns.
import quandl
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.core.display import display
from collections import Counter
%matplotlib inline
df_technical=quandl.get("PERTH/USD_JPY_D", authtoken="Z-5Q1MtTbYxMUDfrvVJw",end_date="2015-12-31")
#Define and calculate nominal return and normalized return
df_technical['Return Nominal'] = df_technical['Bid Average']/((df_technical['Ask Average']).shift(-1))-1
df_technical['Return'] = df_technical['Bid Average'] - (df_technical['Ask Average'].shift(-1))
return_rescale = df_technical['Return'].max() - df_technical['Return'].min()
df_technical['Return'] = df_technical['Return'] / return_rescale
#Return from Dollar perspective
plt.style.use('fivethirtyeight')
ax=df_technical.plot(y='Return Nominal', figsize=(10,4),lw=1,ylim=(-0.06,0.06))
ax.set_title("JPY Return")
ax.set_xlabel("Date")
ax.set_ylabel("Daily %")
plt.show()
We see how the Asian crisis in 1997 and the Financial Crisis in 2008 affected the volatility of returns. Next, we create our features: in the DataFrame below, we create a dummy per row indicating whether the JPY will be stronger the next day; in that case we want to buy JPY today to enjoy the return in USD.
# We create a signal with dummy variables for profit or loss.
#'Return' was computed using next day's price (shift(-1)), so it already looks one day ahead
df_technical['Dummy'] = df_technical['Return']
#A positive return means a stronger JPY, so we want to be holding JPY before that moment in order to achieve
#profits in USD
df_technical['Dummy'] = df_technical['Dummy'].apply(lambda x: 1 if x>0.0 else 0)
# df.dropna(inplace=True)
df_technical.head()
The next step is to define the function for the trading orders and to create features with the returns over the next 3 days:
#We define the function to buy or sell. The requirement indicates the return we apply as a buying signal
def buy_sell_hold(*args):
    cols = [c for c in args]
    #Requirement = minimum return to trigger a buy or sell
    requirement = 0
    for col in cols:
        if col > requirement:
            return 1
        if col < requirement:
            return -1
#We calculate returns over a 3 day horizon
hm_days=3
for i in range(1,hm_days+1):
    df_technical['Feature_{}d'.format(i)] = (
        df_technical['Bid Average']/((df_technical['Ask Average']).shift(-i))-1)
df_technical.fillna(0, inplace=True)
for i in range(1,hm_days+1):
    #map applies the function to all the inputs
    #Right now the target depends solely on the 3 day horizon
    #Potential improvement: trading with different target horizons
    df_technical['JPYUSD_target'] = list(map(buy_sell_hold,df_technical['Feature_{}d'.format(i)]))
With the targets ready and the historical returns as the only source of training features, we can start to develop our machine learning model:
#Define data to apply machine learning
df_vals = df_technical['Return Nominal']
df_vals = df_vals.replace([np.inf, -np.inf], 0)
df_vals.fillna(0, inplace=True)
df_technical['Real Return Sign'] = df_technical['Return Nominal']
#To check our recall
df_technical['Real Return Sign'] = df_technical['Real Return Sign'].apply(lambda x: 1 if x>0.0 else -1)
X = df_vals.values
y = df_technical['JPYUSD_target'].values
y=np.nan_to_num(y)
train_size = int(len(X) * 0.66)
y_real_labels=df_technical['Real Return Sign'][train_size:len(y)]
X_train, X_test = X[0:train_size], X[train_size:len(X)]
y_train, y_test = y[0:train_size], y[train_size:len(y)]
from sklearn import svm, neighbors
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.metrics import recall_score, precision_score, accuracy_score, confusion_matrix
#We apply ensemble method (voting classifier)
clf = VotingClassifier([('lsvc',svm.LinearSVC()),
('knn',neighbors.KNeighborsClassifier()),
('rfor',RandomForestClassifier())])
clf.fit(X_train.reshape(-1, 1), y_train)
predictions = clf.predict(X_test.reshape(-1, 1))
confidence = clf.score(X_test.reshape(-1, 1), y_test)
display('Confusion matrix:', confusion_matrix(y_real_labels, predictions,labels=[1,-1]))
print('Accuracy 3d Target:',confidence)
print('Accuracy 1d Target: ',accuracy_score(y_real_labels, predictions))
print('Predicted class counts:',Counter(predictions))
print('Real class counts: {}'.format(Counter(y_real_labels)))
print('Recall: ' ,recall_score(y_real_labels, predictions,pos_label=1))
print('Precision: ' ,precision_score(y_real_labels, predictions,pos_label=1))
We apply an ensemble method, whose goal is to combine the predictions of several base estimators that together provide, through different techniques, better generalization and robustness than a single estimator.
In our case, the method chosen is a Voting Classifier. It combines different classifiers and uses a majority vote to predict the class label, correcting the classifiers' individual weaknesses.
The three classifiers used in the analysis are:
- A Linear Support Vector Classifier (svm.LinearSVC).
- A K-Nearest Neighbors classifier (neighbors.KNeighborsClassifier).
- A Random Forest classifier (RandomForestClassifier).
The key metric here is the 'precision' percentage, as it indicates how precise we are when buying JPY. We obtain 72%, which means that when we buy JPY we profit 72% of the time, very good news. Looking at the 'recall' we see a lower percentage: of the actual positive instances, we only enter (buy) 55% of the time. We therefore let many good deals escape; however, our algorithm is very consistent in reading positive trends, since when it enters it is very certain of the positive return, and equally consistent in signalling exits. Both figures can be verified from the matrix below.
A good overview of our classification can be seen from the 'Confusion Matrix':
|  | Predicted 1 (Buy) | Predicted 0 (Sell or Hold) |
| --- | --- | --- |
| Actual 1 (Buy) | 495 | 400 |
| Actual 0 (Sell or Hold) | 189 | 1340 |
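Both metrics quoted above can be recovered directly from this matrix; a quick sanity check:

#Recover precision and recall from the confusion matrix above
tp, fn = 495, 400    #actual buys: predicted buy / predicted sell-or-hold
fp, tn = 189, 1340   #actual sell-or-hold: predicted buy / predicted sell-or-hold
precision = tp / (tp + fp)   #how often a buy signal is right: ~0.72
recall = tp / (tp + fn)      #share of actual buys we capture: ~0.55
print('Precision: %.2f, Recall: %.2f' % (precision, recall))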
Now we are ready to backtest the algorithm with real unseen data.
# Calculate equity..
contracts = 10000.0
commission = 0.0
df_trade = pd.DataFrame(X_test, columns=['Return nominal'])
df_trade['Label'] = y_test
df_trade['Pred'] = predictions
df_trade['Won'] = df_trade['Label'] == df_trade['Pred']
df_trade.drop(df_trade.index[len(df_trade)-1], inplace=True)
#Initial capital 10k USD, we assume we reinvest earnings.
df_trade['Capital']=10000
df_trade['Pos Hit']=""
Capital=10000
curr=''
pos=0
neg=0
for i in range(0,len(df_trade['Pred'])-1):
    #Buy signal: buy JPY
    if df_trade['Pred'].iloc[i]==1:
        curr='JPY'
        #Return if we buy/hold JPY:
        df_trade['Capital'].iloc[i+1]=df_trade['Capital'].iloc[i]*(1+df_trade['Return nominal'][i])
        #Count of profit and loss trades:
        if np.sign(df_trade['Return nominal'][i])>0:
            pos=pos+1
        else:
            neg=neg+1
    #If sell signal, we sell JPY and hold USD. No return.
    elif df_trade['Pred'].iloc[i]==-1 and curr=='JPY':
        curr='USD'
        df_trade['Capital'].iloc[i+1]=df_trade['Capital'].iloc[i]
    #If we hold USD and get a sell signal, we keep holding USD.
    elif df_trade['Pred'].iloc[i]==-1 and curr=='USD':
        curr='USD'
        df_trade['Capital'].iloc[i+1]=df_trade['Capital'].iloc[i]
print('Positive trades: ',pos,'Negative trades: ',neg)
#Profit function if selling contracts. Not used as we analyse the position of a US investor
def calc_profit(row):
    if row['Won']:
        return abs(row['Return nominal'])*contracts - commission
    else:
        return -abs(row['Return nominal'])*contracts - commission
df_trade['Pnl'] = df_trade.apply(lambda row: calc_profit(row), axis=1)
#If we could sell and buy 3 day contracts.
df_trade['Equity contracts'] = df_trade['Pnl'].cumsum()
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}
#Avoid scrolling output(Above)
#We run the backtesting and plot it in simulation:
import io
import numpy
import matplotlib.pyplot as plt
from matplotlib import animation
import datetime
from numpy import genfromtxt
from IPython.display import HTML
from matplotlib import style
style.use('fivethirtyeight')
x = df_trade.index.tolist()
y = df_trade['Capital'].tolist()
fig, ax = plt.subplots(figsize=(13, 7))
line, = ax.plot([], [], 'b-')
ax.margins(0.05)
plt.xlabel('Trading day')
plt.ylabel('Capital')
plt.title('Backtesting')
def init():
    line.set_data(x,y)
    return line,
def animate(i):
    imin = 0 #min(max(0, i - win), x.size - win)
    xdata = x[imin:i+2]
    ydata = y[imin:i+2]
    line.set_data(xdata, ydata)
    ax.relim()
    ax.autoscale()
    return line,
plt.tight_layout()
anim = animation.FuncAnimation(fig, animate, frames=len(df_trade['Capital']),init_func=init, interval=50)
plt.close()
HTML(anim.to_html5_video())
Note: we now perform the same process for the EUR before moving to the final analysis.
#Same analysis, against the EUR
df_technical_1=quandl.get("FED/RXI_US_N_B_EU", end_date="2017-12-31")
#Quandl returns a single price column; compute the nominal return from it
df_technical_1['Return Nominal'] = df_technical_1.iloc[:,0]/df_technical_1.iloc[:,0].shift(-1)-1
ax1=df_technical_1.plot(y='Return Nominal', figsize=(10,4),lw=1,ylim=(-0.04,0.04))
ax1.set_title("EUR Return")
ax1.set_xlabel("Date")
ax1.set_ylabel("Daily %")
plt.show()
df_technical_1['Dummy'] = df_technical_1['Return Nominal']
df_technical_1['Dummy'] = df_technical_1['Dummy'].apply(lambda x: 1 if x>0.0 else 0)
hm_days=3
for i in range(1,hm_days+1):
    df_technical_1['Feature_{}d'.format(i)] = (
        df_technical_1['Return Nominal']/((df_technical_1['Return Nominal']).shift(-i))-1)
#Fill remaining gaps in the EUR frame
df_technical_1.fillna(0, inplace=True)
for i in range(1,hm_days+1):
    df_technical_1['EURUSD_target'] = list(map(buy_sell_hold,df_technical_1['Feature_{}d'.format(i)]))
df_technical_1['EURUSD_target'].value_counts()
df_vals_1 = df_technical_1['Return Nominal']
df_vals_1 = df_vals_1.replace([np.inf, -np.inf], 0)
df_vals_1.fillna(0, inplace=True)
df_technical_1['Real Return Sign'] = df_technical_1['Return Nominal']
df_technical_1['Real Return Sign'] = df_technical_1['Real Return Sign'].apply(lambda x: 1 if x>0.0 else -1)
X1 = df_vals_1.values
y1 = df_technical_1['EURUSD_target'].values
y1=np.nan_to_num(y1)
train_size_1 = int(len(X1) * 0.80)
y_real_labels_1=df_technical_1['Real Return Sign'][train_size_1:len(y1)]
X_train1, X_test1 = X1[0:train_size_1], X1[train_size_1:len(X1)]
y_train1, y_test1 = y1[0:train_size_1], y1[train_size_1:len(y1)]
clf1 = VotingClassifier([('lsvc',svm.LinearSVC()),
('knn',neighbors.KNeighborsClassifier()),
('rfor',RandomForestClassifier())])
clf1.fit(X_train1.reshape(-1, 1), y_train1)
confidence_1 = clf1.score(X_test1.reshape(-1, 1), y_test1)
predictions_1 = clf1.predict(X_test1.reshape(-1, 1))
display('Confusion matrix:', confusion_matrix(y_real_labels_1, predictions_1,labels=[1,-1]))
print('Accuracy 3d Target:',confidence_1)
print('Accuracy 1d Target: ',accuracy_score(y_real_labels_1, predictions_1))
print('Predicted class counts:',Counter(predictions_1))
print('Real class counts: {}'.format(Counter(y_real_labels_1)))
print('Recall: ' ,recall_score(y_real_labels_1, predictions_1,pos_label=1))
print('Precision: ' ,precision_score(y_real_labels_1, predictions_1,pos_label=1))
contracts = 10000.0
commission = 0.0
df_trade1 = pd.DataFrame(X_test1, columns=['Return nominal'])
df_trade1['Label'] = y_test1
df_trade1['Pred'] = predictions_1
df_trade1['Won'] = df_trade1['Label'] == df_trade1['Pred']
df_trade1.drop(df_trade1.index[len(df_trade1)-1], inplace=True)
df_trade1['Capital']=10000
Capital=10000
curr=''
pos1=0
neg1=0
for i in range(0,len(df_trade1['Pred'])-1):
    if df_trade1['Pred'].iloc[i]==1:
        curr='EUR'
        df_trade1['Capital'].iloc[i+1]=df_trade1['Capital'].iloc[i]*(1+df_trade1['Return nominal'][i])
        if np.sign(df_trade1['Return nominal'][i])>0:
            pos1=pos1+1
        else:
            neg1=neg1+1
    elif df_trade1['Pred'].iloc[i]==-1 and curr=='EUR':
        curr='USD'
        df_trade1['Capital'].iloc[i+1]=df_trade1['Capital'].iloc[i]
    elif df_trade1['Pred'].iloc[i]==-1 and curr=='USD':
        curr='USD'
        df_trade1['Capital'].iloc[i+1]=df_trade1['Capital'].iloc[i]
print('Positive trades: ',pos1,'Negative trades: ',neg1)
df_trade1['Pnl'] = df_trade1.apply(lambda row: calc_profit(row), axis=1)
df_trade1['Equity contracts'] = df_trade1['Pnl'].cumsum()
Confusion Matrix for EUR
|  | Predicted 1 (Buy) | Predicted 0 (Sell or Hold) |
| --- | --- | --- |
| Actual 1 (Buy) | 63 | 425 |
| Actual 0 (Sell or Hold) | 39 | 428 |
We see poor accuracy in daily trading; however, we keep a good enough precision (62%) to make sustainable returns. As we previously saw with the JPY, the 'recall' is lower; in the case of the EUR it is extremely low (13%), which means we miss most buying opportunities.
Let's see how it performs in the back-test!
#Running backtest against EUR
x1 = df_trade1.index.tolist()
y1 = df_trade1['Capital'].tolist()
fig, ax = plt.subplots(figsize=(13, 7))
line, = ax.plot([], [], 'b-')
ax.margins(0.05)
plt.xlabel('Trading day')
plt.ylabel('Capital')
plt.title('Backtesting')
def init():
    line.set_data(x1,y1)
    return line,
def animate(i):
    imin = 0 #min(max(0, i - win), x.size - win)
    xdata = x1[imin:i+2]
    ydata = y1[imin:i+2]
    line.set_data(xdata, ydata)
    ax.relim()
    ax.autoscale()
    return line,
plt.tight_layout()
anim = animation.FuncAnimation(fig, animate, frames=len(df_trade1['Capital']),init_func=init, interval=50)
plt.close()
HTML(anim.to_html5_video())
# Calculate summary of trades against JPY and EUR.
#Function to round.
from math import ceil, floor
def float_round(num, places = 0, direction = floor):
    return direction(num * (10**places)) / float(10**places)
print('#################### JPY Summary ######################################## EUR Summary ####################')
print('__________________________________________________________________________________________________________')
print("Net Profit : ", float_round((df_trade['Capital'].iloc[-1]/df_trade['Capital'].iloc[0])-1,2,round),
" "*int(len(str('########################################'))/2),
"Net Profit : ", float_round((df_trade1['Capital'].iloc[-1]/df_trade1['Capital'].iloc[0])-1,2,round)
)
total_dist=len(str("Net Profit : ")) \
+ len(str(float_round((df_trade['Capital'].iloc[-1]/df_trade['Capital'].iloc[0])-1,2,round))) \
+ len(str(" "*(int(len(str('########################################'))/2)+1)))
print('__________________________________________________________________________________________________________')
print("Number Winning Trades : %d" % pos,
" "*(total_dist-len(str("Number Winning Trades : %d" % pos))),
"Number Winning Trades : %d" % pos1
)
print('__________________________________________________________________________________________________________')
print("Number Losing Trades : %d" % neg,
" "*(total_dist-len(str("Number Losing Trades : %d" % neg))),
"Number Losing Trades : %d" % neg1
)
print('__________________________________________________________________________________________________________')
print("Percent Profitable : %.2f%%" % (100*pos/(pos + neg)),
" "*(total_dist-len(str("Percent Profitable : %.2f%%" % (100*pos/(pos + neg))))),
"Percent Profitable : %.2f%%" % (100*pos1/(pos1 + neg1))
)
print('__________________________________________________________________________________________________________')
len_following=len(str("Yearly return : ")) \
+len(str(float_round((df_trade['Capital'].iloc[-1]/df_trade['Capital'].iloc[0]-1)
/len(df_trade['Capital'])*365,2,round)))
print("Yearly return : " ,float_round((df_trade['Capital'].iloc[-1]/df_trade['Capital'].iloc[0]-1)
/len(df_trade['Capital'])*365,2,round),
" "*(total_dist-len_following-1),
"Yearly return : " ,float_round((df_trade1['Capital'].iloc[-1]/df_trade1['Capital'].iloc[0]-1)
/len(df_trade1['Capital'])*365,2,round)
)
df_trade['Difs']=0
for i in range(0,len(df_trade['Capital'])-1):
    if i==0:
        df_trade['Difs'].iloc[0]=0
    else:
        df_trade['Difs'].iloc[i]=(df_trade['Capital'].iloc[i]/df_trade['Capital'].iloc[i-1])-1
df_trade1['Difs']=0
for i in range(0,len(df_trade1['Capital'])-1):
    if i==0:
        df_trade1['Difs'].iloc[0]=0
    else:
        df_trade1['Difs'].iloc[i]=(df_trade1['Capital'].iloc[i]/df_trade1['Capital'].iloc[i-1])-1
print('__________________________________________________________________________________________________________')
len_following=len(str("Avg Win Trade : %.3f%%" % df_trade[df_trade['Difs']>0.0]['Difs'].mean()))
print("Avg Win Trade : %.3f%%" % df_trade[df_trade['Difs']>0.0]['Difs'].mean(),
" "*(total_dist-len_following),
"Avg Win Trade : %.3f%%" % df_trade1[df_trade1['Difs']>0.0]['Difs'].mean()
)
print('__________________________________________________________________________________________________________')
len_following=len(str("Avg Loss Trade : %.3f%%" % df_trade[df_trade['Difs']<0.0]['Difs'].mean()))
print("Avg Loss Trade : %.3f%%" % df_trade[df_trade['Difs']<0.0]['Difs'].mean(),
" "*(total_dist-len_following),
"Avg Loss Trade : %.3f%%" % df_trade1[df_trade1['Difs']<0.0]['Difs'].mean()
)
print('__________________________________________________________________________________________________________')
len_following=len(("Largest Win Trade : %.3f%%" % df_trade[df_trade['Difs']>0.0]['Difs'].max()))
print("Largest Win Trade : %.3f%%" % df_trade[df_trade['Difs']>0.0]['Difs'].max(),
" "*(total_dist-len_following),
"Largest Win Trade : %.3f%%" % df_trade1[df_trade1['Difs']>0.0]['Difs'].max()
)
print('__________________________________________________________________________________________________________')
len_following=len(("Largest Loss Trade : %.3f%%" % df_trade[df_trade['Difs']<0.0]['Difs'].min()))
print("Largest Loss Trade : %.3f%%" % df_trade[df_trade['Difs']<0.0]['Difs'].min(),
" "*(total_dist-len_following),
"Largest Loss Trade : %.3f%%" % df_trade1[df_trade1['Difs']<0.0]['Difs'].min()
)
print('__________________________________________________________________________________________________________')
len_following=len(("Profit Factor : %.2f" % abs(df_trade[df_trade['Difs']>0.0]['Difs'].sum()
/df_trade[df_trade['Difs']<0.0]['Difs'].sum())))
print("Profit Factor : %.2f" % abs(df_trade[df_trade['Difs']>0.0]['Difs'].sum()
/df_trade[df_trade['Difs']<0.0]['Difs'].sum()),
" "*(total_dist-len_following),
"Profit Factor : %.2f" % abs(df_trade1[df_trade1['Difs']>0.0]['Difs'].sum()
/df_trade1[df_trade1['Difs']<0.0]['Difs'].sum()),
)
print('__________________________________________________________________________________________________________')
len_following=len(("Trading days : %.0f" % abs(len(df_trade['Capital']))))
print(("Trading days : %.0f" % abs(len(df_trade['Capital']))),
" "*(total_dist-len_following),
("Trading days : %.0f" % abs(len(df_trade1['Capital'])))
)
print('__________________________________________________________________________________________________________')
len_following=len(("Portfolio value ($) : %.0f" % abs(df_trade['Capital'].iloc[-1])))
print(("Portfolio value ($) : %.0f" % abs(df_trade['Capital'].iloc[-1])),
" "*(total_dist-len_following),
("Portfolio value ($) : %.0f" % abs(df_trade1['Capital'].iloc[-1]))
)
print('__________________________________________________________________________________________________________')
#Plot distributions trading algo vs Historical
fig = plt.figure(figsize=(13, 7))
plt.subplot(2, 2,1)
plt.hist(df_trade['Difs'],bins=35,color='#1286d6')
plt.xlim(-0.025,0.025)
plt.ylim(0,120)
plt.axvline(0, color='#34402d', linestyle='dashed', linewidth=1.5)
plt.title('Histogram: Predictor Returns on JPY')
plt.ylabel('Frequency')
plt.xlabel('Return in %')
plt.subplot(2, 2,2)
plt.hist(df_technical['Return Nominal'][train_size:-1],bins=50, color='#57bf22')
plt.xlim(-0.025,0.025)
plt.ylim(0,400)
plt.axvline(0, color='#34402d', linestyle='dashed', linewidth=1.5)
plt.title('Histogram: JPY Returns')
plt.ylabel('Frequency')
plt.xlabel('Return in %')
plt.subplot(2, 2,3)
plt.hist(df_trade1['Difs'],bins=20,color='#1286d6')
plt.xlim(-0.025,0.025)
plt.ylim(0,25)
plt.axvline(0, color='#34402d', linestyle='dashed', linewidth=1.5)
plt.title('Histogram: Predictor Returns on EUR')
plt.ylabel('Frequency')
plt.xlabel('Return in %')
plt.subplot(2, 2,4)
plt.hist(df_technical_1['Return Nominal'][train_size_1:-1],bins=50, color='#57bf22')
plt.xlim(-0.025,0.025)
plt.ylim(0,150)
plt.axvline(0, color='#34402d', linestyle='dashed', linewidth=1.5)
plt.title('Histogram: EUR Returns')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
The histograms above compare the distribution of the strategy's returns with the historical returns of each pair.
Our portfolio of 10k USD performs very well against the JPY: 72% of our operations are profitable (the 'precision' metric) and our return distribution is positively skewed compared to the historical returns. After 2423 trading days, our portfolio value is 132k USD.
On the other hand, the performance against the EUR is not as successful. We make a profit 62% of the times we enter a trade. Fortunately, the returns are positively skewed, meaning that on average our profitable trades are larger than our losing trades, which eventually translates into an overall positive return, with a final portfolio value of 12k USD after 954 trading days. The metrics tell the same story: we saw a very low 'recall', so we expect to miss many positive trends; however, our precision is above the baseline, which means that when we enter a trade we are fairly certain it will be positive. That is why we train our algorithm with a 3 day horizon: to know that when we enter a buy trade, the trend will be strong enough to deliver a positive return.
In general terms the algorithm performs well, especially considering the historically negative skewness of both currencies' returns.
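The skewness claims can be checked numerically; a minimal sketch, assuming scipy is available in the environment:

#Compare the skewness of the strategy's returns with the historical returns
from scipy.stats import skew
print('JPY strategy skew: %.2f vs historical: %.2f' % (
    skew(df_trade['Difs']), skew(df_technical['Return Nominal'][train_size:-1].fillna(0))))
print('EUR strategy skew: %.2f vs historical: %.2f' % (
    skew(df_trade1['Difs']), skew(df_technical_1['Return Nominal'][train_size_1:-1].fillna(0))))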
To finalize our analysis, we can visualize our performance in both currencies:
#Plot the backtest with our predictions and an initial capital of 10k USD
fig = plt.figure(figsize=(13, 14))
plt.subplot(3, 1,1)
plt.plot(df_trade['Capital'],lw=1.5,zorder=15,color='#0a2bb1')
plt.title('Backtest with $10000 initial capital against JPY')
plt.xlabel('Trades')
plt.ylabel('Capital (USD)')
plt.xlim(0,len(df_trade['Capital']))
#Plot green if pos, red if neg
for r in range(0,len(df_trade['Difs'])):
    if df_trade['Difs'][r]>0:
        plt.axvline(x=r, linewidth=0.5, alpha=0.7, color='g')
    elif df_trade['Difs'][r]<0:
        plt.axvline(x=r, linewidth=0.5, alpha=0.7, color='r')
plt.subplot(3, 1,2)
plt.plot(df_trade['Capital'][:len(df_trade1['Capital'])],lw=1.5,zorder=15,color='#0a2bb1')
plt.title('Backtest with $10000 initial capital against JPY \n Same trading days as EUR')
plt.xlabel('Trades')
plt.ylabel('Capital (USD)')
plt.xlim(0,len(df_trade['Capital'][:len(df_trade1['Capital'])]))
for r in range(0,len(df_trade['Difs'])):
    if df_trade['Difs'][r]>0:
        plt.axvline(x=r, linewidth=0.5, alpha=0.7, color='g')
    elif df_trade['Difs'][r]<0:
        plt.axvline(x=r, linewidth=0.5, alpha=0.7, color='r')
plt.subplot(3, 1,3)
plt.plot(df_trade1['Capital'],lw=1.5)
plt.title('Backtest with $10000 initial capital against EUR')
plt.xlabel('Trades')
plt.ylabel('Capital (USD)')
plt.xlim(0,len(df_trade1['Capital']))
for r in range(0,len(df_trade1['Difs'])):
    if df_trade1['Difs'][r]>0:
        plt.axvline(x=r, linewidth=0.5, alpha=0.7, color='g')
    elif df_trade1['Difs'][r]<0:
        plt.axvline(x=r, linewidth=0.5, alpha=0.7, color='r')
plt.tight_layout()
plt.show()
In the graphs above we can observe how few trades we execute in the EUR/USD pair compared to the JPY/USD (see the green and red lines). This is due to our low 'recall' with the EUR: the strategy stays conservative and enters the market only when it is fairly certain of a positive return.
After this exploratory analysis of what machine learning can offer in terms of pattern recognition, we can look at potential improvements to make our predictions more robust. Cross-validation techniques for time series, seeking better tuning and more solid performance on unseen data, would be a major improvement; a sketch follows below. Furthermore, the inclusion of more informative features such as volumes, moving averages, or fundamental shocks may also improve performance.
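A minimal sketch of such cross-validation with scikit-learn's TimeSeriesSplit, reusing the JPY data from above (each training fold strictly precedes its test fold, so no future information leaks into training):

#Walk-forward cross-validation sketch with TimeSeriesSplit
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)
scores = []
for train_idx, test_idx in tscv.split(X.reshape(-1, 1)):
    clf_cv = VotingClassifier([('lsvc', svm.LinearSVC()),
                               ('knn', neighbors.KNeighborsClassifier()),
                               ('rfor', RandomForestClassifier())])
    clf_cv.fit(X[train_idx].reshape(-1, 1), y[train_idx])
    scores.append(clf_cv.score(X[test_idx].reshape(-1, 1), y[test_idx]))
print('Fold accuracies:', scores)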
Thank you for reading!